
    Static and Dynamic Scheduling for Effective Use of Multicore Systems

    Multicore systems have gained increasing importance in high performance computers. Compared to traditional microarchitectures, multicore architectures have a simpler design, a higher performance-to-area ratio, and improved power efficiency. Although the multicore architecture has many advantages, traditional parallel programming techniques do not map onto the new architecture efficiently. This dissertation addresses how to determine optimized thread schedules that improve data reuse on shared-memory multicore systems, and how to design parallel software that scales on both shared-memory and distributed-memory multicore systems. We propose an analytical cache model to predict the number of cache misses in the time-shared L2 cache of a multicore processor. The model provides insight into the impact of cache sharing and cache contention between threads. Guided by the model, we build an affinity-based thread scheduling framework to determine optimized thread schedules that improve data reuse at all levels of a complex memory hierarchy. The framework includes a model to estimate the cost of a thread schedule, which consists of three submodels: an affinity graph submodel, a memory hierarchy submodel, and a cost submodel. Based on this model, we design a hierarchical graph partitioning algorithm to find near-optimal schedules, and we extend the algorithm to support threads with data dependences. The algorithms are implemented in a feedback-directed optimization prototype system that builds upon a binary instrumentation tool and can greatly improve program performance on shared-memory multicore architectures. We also study a dynamic, data-availability-driven scheduling approach to designing new parallel software on distributed-memory multicore architectures, and we have implemented a decentralized dynamic runtime system whose design focuses on scalability: at any time, only a small portion of the task graph exists in memory. We propose an algorithm that resolves data dependences in a distributed manner without process cooperation. Our experimental results demonstrate the scalability and practicality of the approach on both shared-memory and distributed-memory multicore systems. Finally, we present a scalable nonblocking topology-aware multicast scheme for distributed DAG scheduling applications.
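
    As a rough illustration of the affinity-graph idea described above (this is not the dissertation's actual cost model), the Python sketch below scores a candidate thread-to-cache assignment by the amount of shared-data weight that ends up split across caches; the function name, the threads, and the weights are invented for the example.

        # Illustrative sketch only: score a thread schedule with an "affinity
        # graph" whose edge weights approximate the volume of data shared
        # between pairs of threads.  Shared data kept within one cache group
        # stays on-chip; shared data split across caches is counted as cost.
        def schedule_cost(affinity, groups):
            """affinity[(i, j)] = shared-data weight between threads i and j;
            groups = list of thread sets, one per shared cache (e.g., per L2)."""
            cache_of = {t: g for g, group in enumerate(groups) for t in group}
            cost = 0
            for (i, j), weight in affinity.items():
                if cache_of[i] != cache_of[j]:   # reuse lost across caches
                    cost += weight
            return cost

        # Toy example: 4 threads on two dual-core caches.
        affinity = {(0, 1): 10, (2, 3): 8, (0, 2): 1, (1, 3): 1}
        print(schedule_cost(affinity, [{0, 1}, {2, 3}]))   # 2  (good pairing)
        print(schedule_cost(affinity, [{0, 2}, {1, 3}]))   # 18 (poor pairing)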

    On A Simpler and Faster Derivation of Single Use Reliability Mean and Variance for Model-Based Statistical Testing

    Markov chain usage-based statistical testing has proved sound and effective in providing audit trails of evidence for certifying software-intensive systems. The system end-to-end reliability is derived analytically in closed form, following an arc-based Bayesian model. System reliability is represented by an important statistic called single use reliability, defined as the probability that a randomly selected use is successful. This paper continues our earlier work on a simpler and faster derivation of the single use reliability mean, and proposes a new derivation of the single use reliability variance that applies a well-known theorem and eliminates the need to compute the second moments of the arc failure probabilities. Together, the new results form an analysis that is simpler, faster, and more direct, and that offers a more intuitive explanation. The new theory is illustrated with three simple Markov chain usage models, including manual derivations and experimental results.
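
    To make the single use reliability statistic concrete, the sketch below computes a first-moment estimate for a toy three-state usage model by solving a small linear system; it treats each arc's expected reliability as a fixed number and assumes independence across traversals, so it is background intuition only, not the paper's arc-based Bayesian derivation of the mean and variance.

        # Toy first-moment computation: solve R_i = sum_j p_ij * r_ij * R_j
        # with R_sink = 1, where p_ij is the usage transition probability and
        # r_ij the expected reliability of arc (i, j).
        import numpy as np

        # States: 0 = Enter, 1 = Use, 2 = Exit (sink).
        # arcs[(i, j)] = (usage probability p_ij, expected arc reliability r_ij)
        arcs = {
            (0, 1): (1.0, 0.99),
            (1, 1): (0.6, 0.98),     # loop: repeated stimulus within a use
            (1, 2): (0.4, 0.995),
        }
        n, sink = 3, 2
        A, b = np.zeros((n, n)), np.zeros(n)
        for (i, j), (p, r) in arcs.items():
            if j == sink:
                b[i] += p * r        # contribution of arcs entering the sink
            else:
                A[i, j] += p * r
        b[sink] = 1.0
        R = np.linalg.solve(np.eye(n) - A, b)
        print(f"single use reliability (first-moment sketch): {R[0]:.4f}")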

    A Real-Time Machine Learning and Visualization Framework for Scientific Workflows

    High-performance computing resources are now widely used in science and engineering. Typical post-hoc approaches save simulation output to persistent storage, so data analysis tasks must read the data back from storage into memory. For large-scale scientific simulations, such I/O operations introduce significant overhead. In-situ/in-transit approaches bypass I/O by accessing and processing in-memory simulation results directly, which suggests that simulations and analysis applications should be more closely coupled. This paper constructs a flexible and extensible framework that connects scientific simulations with multi-step machine learning processes and in-situ visualization tools, providing plug-in analysis and visualization functionality over complex workflows in real time. A distributed simulation-time clustering method is proposed to detect anomalies in real turbulent flows.
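
    As a minimal, single-process illustration of simulation-time anomaly detection (the paper's clustering method is distributed and more elaborate), one could cluster the in-memory samples of the current time step and flag points far from their centroid; the helper below and its parameters are assumptions made for the example.

        # Cluster in-memory simulation samples and flag points whose distance
        # to their cluster centroid is unusually large; no file I/O involved.
        import numpy as np
        from sklearn.cluster import MiniBatchKMeans

        def detect_anomalies(samples, n_clusters=4, threshold=3.0):
            """samples: (N, d) array of field values held in memory."""
            km = MiniBatchKMeans(n_clusters=n_clusters, n_init=3).fit(samples)
            centers = km.cluster_centers_[km.labels_]
            dist = np.linalg.norm(samples - centers, axis=1)
            cutoff = dist.mean() + threshold * dist.std()
            return np.where(dist > cutoff)[0]     # indices of outlier samples

        # In-situ use: called from the simulation loop on the current slice.
        field = np.random.randn(10_000, 3)        # stand-in for velocity samples
        print(detect_anomalies(field))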

    Building a scientific workflow framework to enable real‐time machine learning and visualization

    We have entered the era of big data. In high performance computing, large-scale simulations can generate huge amounts of data containing potentially critical information. However, these data are usually saved in intermediate files and are not visible until advanced data analytics techniques are applied after all simulation data have been read back from persistent storage (e.g., local disks or a parallel file system). This approach leaves users waiting a long time for running simulations without knowing the status of the running job. In this paper, we build a new computational framework that couples scientific simulations with multi-step machine learning processes and in-situ data visualization. We also design a new scalable simulation-time clustering algorithm to automatically detect fluid flow anomalies. The framework is built from different software components and provides plug-in data analysis and visualization functions over complex scientific workflows. With this framework, users can monitor ongoing extreme-scale turbulent flow simulations and receive real-time notifications of special patterns or anomalies.
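
    A minimal sketch of the coupling idea, with hypothetical names rather than the framework's actual API: the simulation loop publishes in-memory results to registered analysis and visualization plug-ins instead of writing intermediate files.

        # Plug-in pipeline: each registered stage (clustering, rendering, ...)
        # receives the in-memory result of every simulation step.
        from typing import Callable, List
        import numpy as np

        class InSituPipeline:
            def __init__(self) -> None:
                self.stages: List[Callable[[int, np.ndarray], None]] = []

            def register(self, stage: Callable[[int, np.ndarray], None]) -> None:
                self.stages.append(stage)

            def publish(self, step: int, data: np.ndarray) -> None:
                for stage in self.stages:
                    stage(step, data)

        pipeline = InSituPipeline()
        pipeline.register(lambda step, d: print(f"step {step}: mean={d.mean():.3f}"))

        for step in range(3):                  # stand-in for the simulation loop
            field = np.random.rand(64, 64)     # produced in memory, never on disk
            pipeline.publish(step, field)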

    Interactive 3D simulation for fluid–structure interactions using dual coupled GPUs

    This work integrates high-speed parallel computation with interactive 3D visualization of the lattice-Boltzmann-based immersed boundary method for fluid–structure interaction. An NVIDIA Tesla K40c is used for the computations, while an NVIDIA Quadro K5000 is used for 3D vector field visualization. The simulation can be paused at any time step so that the vector field can be explored: the density and placement of streamlines and glyphs are adjustable by the user, panning and zooming are controlled by the mouse, and the simulation can then be resumed. Unlike most scientific applications in computational fluid dynamics, where visualization is performed after the computations, our software allows real-time visualization of the flow fields while the computations take place. To the best of our knowledge, no such GPU tool for FSI exists. Our software can facilitate debugging, enable observation of detailed local flow and deformation fields during computation, and expedite the identification of 'correct' parameter combinations in parametric studies of new phenomena. It is therefore expected to shorten the time to solution and expedite scientific discoveries via scientific computing.
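
    The tool described above runs the lattice-Boltzmann kernels and the rendering on two separate GPUs; purely to illustrate the pause/inspect/resume control flow, a simplified stand-in (in Python rather than the tool's CUDA/OpenGL code) might look like the following.

        # Time-stepping thread that can be frozen while the user explores the
        # current field, then resumed; the "update" is a toy stencil, not LBM.
        import threading
        import time
        import numpy as np

        pause = threading.Event()    # set -> simulation waits for the viewer
        stop = threading.Event()

        def simulate(field):
            while not stop.is_set():
                while pause.is_set() and not stop.is_set():
                    time.sleep(0.01)          # viewer is exploring the snapshot
                field += 0.1 * (np.roll(field, 1, axis=0) - field)   # toy update

        field = np.random.rand(32, 32)
        worker = threading.Thread(target=simulate, args=(field,))
        worker.start()
        time.sleep(0.05)
        pause.set()                           # "pause": inspect the frozen field
        print("paused snapshot mean:", field.mean())
        pause.clear()                         # resume time stepping
        time.sleep(0.05)
        stop.set()
        worker.join()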

    Modeling and Implementation of an Asynchronous Approach to Integrating HPC and Big Data Analysis

    With the emergence of exascale computing and big data analytics, many important scientific applications require the integration of computationally intensive modeling and simulation with data-intensive analysis to accelerate scientific discovery. In this paper, we create an analytical model to steer the optimization of the end-to-end time-to-solution for integrated computation and data analysis. We also design and develop an intelligent data broker that efficiently intertwines the computation stage and the analysis stage to achieve, in practice, the optimal time-to-solution predicted by the analytical model. We perform experiments on both synthetic applications and real-world computational fluid dynamics (CFD) applications. The experiments show that the analytical model exhibits an average relative error of less than 10%, and that application performance can be improved by up to 131% for the synthetic programs and by up to 78% for the real-world CFD application.
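
    One way to picture the data broker's role (a sketch under assumed names, not the paper's implementation): the computation stage pushes each in-memory result into a bounded queue, and the analysis stage drains the queue concurrently, so analysis of step k overlaps with computation of step k+1.

        # Overlap computation and analysis through an in-memory queue.
        import queue
        import threading
        import numpy as np

        broker = queue.Queue(maxsize=4)       # bounded buffer between stages

        def compute(n_steps):
            for _ in range(n_steps):
                result = np.random.rand(256, 256)   # stand-in for a CFD step
                broker.put(result)                  # hand off without disk I/O
            broker.put(None)                        # end-of-stream marker

        def analyze():
            while (data := broker.get()) is not None:
                print("analyzed step, mean =", round(float(data.mean()), 4))

        producer = threading.Thread(target=compute, args=(5,))
        producer.start()
        analyze()
        producer.join()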

    A Simpler and More Direct Derivation of System Reliability Using Markov Chain Usage Models

    Markov chain usage-based statistical testing has been in use for more than two decades and has proved sound and effective in providing audit trails of evidence for certifying software-intensive systems. The system end-to-end reliability is derived analytically in closed form, following an arc-based Bayesian model. System reliability is represented by an important statistic called single use reliability, defined as the probability that a randomly selected use is successful. This paper reviews the analytical derivation of the single use reliability mean and proposes a simpler, faster, and more direct way to compute the expected value, one that also yields an intuitive explanation. The new derivation is illustrated with two examples.
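
    For orientation only, one common first-moment view of single use reliability over a usage chain can be written as the linear recurrence below; it assumes independence of arc reliabilities across traversals and is not the paper's derivation, which works directly from the arc-based Bayesian model.

        % Illustrative first-moment recurrence, not the paper's analysis.
        \[
          R_i \;=\; \sum_{j} p_{ij}\,\bar{r}_{ij}\,R_j,
          \qquad R_{\mathrm{sink}} = 1,
          \qquad \text{single use reliability} \approx R_{\mathrm{source}},
        \]

    where p_{ij} is the usage transition probability of arc (i, j) and \bar{r}_{ij} its expected reliability.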

    XScan: An Integrated Tool for Understanding Open Source Community-Based Scientific Code

    Many scientific communities have adopted community-based models that integrate multiple components to simulate whole-system dynamics. The complexity of these community software projects, which stems from integrating many individual software components developed under different application requirements and for various machine architectures, has become a challenge for effective software system understanding and continuous software development. This paper presents an integrated software toolkit called X-ray Software Scanner (XScan) for better understanding of large-scale community-based scientific codes. The tool quickly summarizes the overall characteristics of scientific codes, including the number of lines of code, programming languages, external library dependencies, and architecture-dependent parallel software features. The XScan toolkit also includes a static software analysis component that collects detailed structural information and provides interactive visualization and analysis of the functions. We use a large-scale community-based Earth System Model to demonstrate the workflow, functions, and visualization of the toolkit, and we discuss the application of advanced graph analytics techniques to assist software modular design and component refactoring.
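
    To give a flavor of the kind of summary the toolkit reports (the sketch below is not XScan itself, and the extension-to-language table is invented), a small script can walk a source tree and tally files and lines of code per language by file extension.

        # Summarize a source tree: files and lines of code per language.
        import os
        from collections import Counter

        EXT_TO_LANG = {".f90": "Fortran", ".F90": "Fortran", ".c": "C",
                       ".h": "C/C++ header", ".cpp": "C++", ".py": "Python"}

        def summarize(root):
            files, lines = Counter(), Counter()
            for dirpath, _, names in os.walk(root):
                for name in names:
                    lang = EXT_TO_LANG.get(os.path.splitext(name)[1])
                    if lang is None:
                        continue
                    path = os.path.join(dirpath, name)
                    with open(path, errors="ignore") as f:
                        lines[lang] += sum(1 for _ in f)
                    files[lang] += 1
            return files, lines

        files, lines = summarize(".")
        for lang in files:
            print(f"{lang:12s} {files[lang]:6d} files {lines[lang]:10d} lines")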